Cavaillé and Ferwerda (Cavaillé and Ferwerda 2023) examine the effect of granting immigrants access to the welfare state in Austria during 2006 on support for far-right parties. In their preferred specification, they find that the interaction of the proportion of non-EU residents in a municipality with the proportion of the population living in public housing in that municipality has an effect of +0.67 (p < .05) on the change in vote share for far-right parties, supporting the main substantive claim that increasing welfare benefits to immigrants increases support for far-right parties. Moderating this, based on a subset of data from Vienna, they find that the percentage of the population in rental housing and the percentage of the population in public housing are both substantive and significant predictors of the change in far-right party vote share, with magnitudes of .03 and .09 (respectively) and p-values < .05. Based on this estimate, they conclude that support was increased by the high proportion of public housing beneficiaries or low-end rentals within districts.
We first reproduce the paper’s main and secondary analyses, and subject these to computational robustness tests.
We then conduct several types of conceptual replication – examining the robustness of results to selection of covariates, reweighting, and the sensitivity of results to resampling and outliers.
We find that the research is computationally replicable, but the evidence for the main causal claims is overstated: it is sensitive to a small set of observations, does not explain most of the variation in outcomes, and does not out-compete simpler causal explanations.
Introduction
Cavaillé and Ferwerda (Cavaillé and Ferwerda 2023) tested the effect of granting immigrants access to the welfare state in Austria during 2006 on the support for far-right parties. We focus on their two main claims, as described in the abstract (pg 19):
we show that the reform increased support for far-right parties with welfare chauvinist platforms. Electoral ward data suggest that this response was concentrated in districts with a high proportion of public housing beneficiaries or low-end rentals. Our findings provide novel evidence that distributional conflict can accelerate the rise of far-right parties in countries with substantial in-kind welfare programs
In their preferred specification, using OLS with clustered standard errors, they find that the interaction of the proportion of non-EU residents in a municipality with the proportion of the population living in public housing in that municipality has an effect of +0.67 (p < .05) on the change in vote share for far-right parties (pg 26, Table 1, col 1), supporting the main substantive claim that increasing welfare benefits to immigrants increases support for far-right parties (pg 19). Moderating this, based on a subset of data from Vienna, they find that the percentage of the population in rental housing and the percentage of the population in public housing are both substantive and significant predictors of the change in far-right party vote share, with magnitudes of .03 and .09 (respectively) and p-values < .05 (pg 30, Table 2, col 1). Based on this estimate, they conclude that support was accelerated by the high proportion of public housing beneficiaries or low-end rentals within districts (pg 19).
We obtained the published replication data set (Cavaille 2023) from the Harvard Dataverse archive, where it had been deposited for public use by the authors. The replication data set included the final processed data used to produce the paper results, along with R code to replicate figures and tables. The data were sufficient to check computational reproducibility but posed challenges for conceptual reproducibility because they provided neither copies of the source data nor links to or citations of the original data sources (which were described only generally). Further, although the replication data did include additional measures not used in the publication, these were largely undocumented and hence difficult to interpret reliably.
We first reproduced the paper’s main and secondary analyses, and subjected these to computational robustness tests. The computational reproductions showed both the main and secondary results to be reproducible. As an unintended consequence, reproducing the results revealed the substantively poor fit of these models: the primary model has an R-squared of .0356. This finding informed further replication analysis.
We then conducted several types of conceptual replication – examining the robustness of results to selection of covariates, reweighting, and the sensitivity of results to resampling and outliers.
Prompted by the unexpectedly poor fit of the primary model, we performed a conceptual replication that applied the authors’ linear model with clustered errors to alternate covariates, and to models that did not include the interaction effects that are central to the paper’s proposed causal explanation. This reveals that the interaction term does not provide a substantial improvement in explanatory power over a more naive baseline.
The authors suggest, in their first analysis of nation-wide election data, that voters in municipalities with high proportions of residents living in public housing as well as comparatively high proportions of third-country nationals voted more for far-right parties as a result of the demand shock on public housing. We undertake a conceptual replication of this result using their data on the city of Vienna. In particular, we look at census tracts—as a proxy for neighborhood—to see if tracts with a higher proportion of third-country nationals and higher proportions of individuals living in public housing tend to vote for far-right parties at higher rates. We find that implementing this conceptual replication increases the magnitude of the main point estimate for the interaction between these two factors, but the estimate is no longer statistically significant at the 5% level.
The authors fit the models in both of their main analyses at the level of administrative units: all municipalities are equally weighted in their nation-wide analysis, and all tracts are equally weighted in their analysis of Vienna. However, it is unclear if this weighting is substantively appropriate. To test the robustness of their conclusions to this choice, we reweight their model so that each administrative unit has weight equal to the number of voters it contains. Implementing this robustness check in their primary analysis increases the magnitude of their main point estimate for the interaction of the proportion of residents living in public housing with the proportion of third-country nationals residing in a municipality, which remains significant at the 5% level. Implementing this robustness check in their secondary analysis has almost no effect on the magnitude or the statistical significance of the main point estimates.
Next, to better understand the authors’ main claims, to see how well their models fit the data, and to search for potential anomalies not evident in low-dimensional summaries, we attempted to visually replicate their primary and secondary analyses. Based on this graphical exploration, we carried out an outlier analysis.
We assessed the robustness of the findings from the Austrian sample by examining potential outliers in the key variables. The previously significant interaction effect between the share of non-EU residents and the share of public housing loses its significance when we exclude abnormally large values of key variables. Covariate balance analysis highlights substantial differences between observations with outliers and those without. When we reevaluate the main argument using only the sample of outliers, all the initial effects become notably stronger in both magnitude and significance. This suggests that the authors’ primary argument may not generalize to all Austrian districts within the sample.
Finally, given the substantive importance placed on the interaction term in the authors’ primary analysis, as well as the generally low goodness-of-fit of the models considered, we evaluated the main models of the primary and secondary analyses using out-of-sample tests of predictive accuracy. We find in both cases that the models with and without the interaction term have essentially the same leave-one-out root mean squared error (0.0283 vs. 0.0284 in the primary analysis; 0.0431 vs. 0.0439 in the secondary analysis), suggesting that the substantive interpretation of the interaction effect may not be appropriate.
Reproducibility
Simple Direct Reproduction (Calibration)
As a baseline check of the model and of our understanding of it, we conducted a simple computational replication using the authors’ supplied data and code, and compared the results to those published.
Reproduction of Table 1 - Model 1 (Primary Result)
Loading required namespace: haven
Loading required namespace: estimatr
| group | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
|---|---|---|---|---|---|---|---|---|---|
| comp | (Intercept) | 0.03684645 | 0.0009130496 | 40.3553683 | 1.803168e-271 | 0.0350559924 | 0.03863691 | 2369 | d_rr_06 |
| comp | dv_pop_01 | 0.02422834 | 0.0121140293 | 2.0000234 | 4.561171e-02 | 0.0004731441 | 0.04798354 | 2369 | d_rr_06 |
| comp | pct_noneu_06 | -0.02263235 | 0.0305294844 | -0.7413275 | 4.585684e-01 | -0.0824996238 | 0.03723493 | 2369 | d_rr_06 |
| comp | dv_pop_01:pct_noneu_06 | 0.66842772 | 0.1658860281 | 4.0294395 | 5.767257e-05 | 0.3431308814 | 0.99372456 | 2369 | d_rr_06 |
| original | (Intercept) | 0.04000000 | 0.0000000000 | NA | 5.000000e-02 | NA | NA | NA | NA |
| original | dv_pop_01 | 0.02000000 | 0.0300000000 | NA | 1.000000e+00 | NA | NA | NA | NA |
| original | pct_noneu_06 | -0.02000000 | 0.0300000000 | NA | 5.000000e-02 | NA | NA | NA | NA |
| original | dv_pop_01:pct_noneu_06 | 0.67000000 | 0.1700000000 | NA | 5.000000e-02 | NA | NA | NA | NA |

| group | r.squared | adj.r.squared | df.residual | res_var | nobs |
|---|---|---|---|---|---|
| comp | 0.03560054 | 0.03437926 | 2369 | 0.000802094 | 2373 |
Reproduction of Table 2 - Model 1 (Secondary Result)
Warning in eval(quote({: Some observations have missingness in the cluster
variable(s) but not in the outcome or covariates. These observations have been
dropped.
The computational reproductions showed both the main and secondary results to be reproducible.
As an unintended consequence, reproducing the results revealed the substantively poor fit of these models: the primary model has an R-squared of .0356. This finding informed the replication analysis.
Computational robustness
Generally, even with a fixed statistical model family and specification, results may vary with the estimation algorithm used and with the specific software implementation of it (Altman, Gill, and McDonald 2004). We evaluate the computational robustness of the model by using alternative algorithms and implementations.
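As an illustration of this kind of check, the following minimal sketch fits one specification three ways (stats::lm, a Gaussian stats::glm, and a direct solution of the normal equations) and reports the largest disagreement between coefficient estimates. It uses simulated data rather than the authors' data, and the variable names (x1, x2, y) are hypothetical.

```r
# Sketch on simulated data: the same linear specification estimated by
# three different routines. Variable names are hypothetical.
set.seed(1)
n <- 500
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- 0.04 + 0.02 * sim$x1 - 0.02 * sim$x2 +
  0.6 * sim$x1 * sim$x2 + rnorm(n, sd = 0.03)

f <- y ~ x1 * x2

fit_lm  <- stats::lm(f, data = sim)                        # QR decomposition
fit_glm <- stats::glm(f, data = sim, family = gaussian())  # IRLS
X <- model.matrix(f, data = sim)
beta_ne <- drop(solve(crossprod(X), crossprod(X, sim$y)))  # normal equations

coef_table <- cbind(lm = coef(fit_lm), glm = coef(fit_glm), normal_eq = beta_ne)

# Largest absolute difference between any estimate and the lm() estimate
max_abs_diff <- max(abs(coef_table - coef_table[, "lm"]))

print(coef_table)
print(max_abs_diff)
```

On a well-conditioned problem such as this one, the three sets of estimates should agree to many decimal places; larger discrepancies would point to numerical sensitivity rather than a substantive difference.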
Primary Result
Code
requireNamespace("arm")
Loading required namespace: arm
Code
results_m1_statlm.lm <- stats::lm(m1.formula, data = austria_authors.df)
results_m1_statglm.glm <- stats::glm(m1.formula, data = austria_authors.df)
results_m1_statbayeslm.glm <- arm::bayesglm(m1.formula, data = austria_authors.df,
                                            prior.scale = Inf, prior.df = Inf)
# Note: could also add nls, bayesglm, mle2 -- would require re-expressing
# current formula in different syntax, and renaming results matrix
ml.ls <- list(
  lmrobust = results_m1_computational.lmr,
  lm = results_m1_statlm.lm,
  glm = results_m1_statglm.glm,
  bayes = results_m1_statbayeslm.glm
)
alt_est_m1.df <- tidy_results(ml.ls)
Warning: The `tidy()` method for objects of class `bayesglm` is not maintained by the broom team, and is only supported through the `glm` tidier method. Please be cautious in interpreting and reporting broom output.
This warning is displayed once per session.
requireNamespace("arm")
results_m2_statlm.lm <- stats::lm(m2.formula, data = vienna_authors.df)
results_m2_statglm.glm <- stats::glm(m2.formula, data = vienna_authors.df)
results_m2_statbayeslm.glm <- arm::bayesglm(m2.formula, data = vienna_authors.df,
                                            prior.scale = Inf, prior.df = Inf)
# Note: could also add nls, bayesglm, mle2 -- would require re-expressing
# current formula in different syntax, and renaming results matrix
ml.ls <- list(
  lmrobust = results_m2_computational.lmr,
  lm = results_m2_statlm.lm,
  glm = results_m2_statglm.glm,
  bayes = results_m2_statbayeslm.glm
)
alt_est_m2.df <- tidy_results(ml.ls)
alt_est_m2_summary.df <- tidy_summary(ml.ls)
alt_est_m2.df %>%
  group_by(repl_id) %>%
  gt()
Summary - Computation Robustness of Reproducibility
The computational reproductions showed both the main and secondary results to be robust to alternative choices of algorithm and software implementation.
Prompted by the unexpectedly poor fit of the primary model, we performed a conceptual replication that applied the authors’ linear model with clustered errors to alternate covariates, and to models that did not include the interaction effects that are central to the paper’s proposed causal explanation.
Observe that none of the alternative models fit the data well. Moreover, models without interaction effects provide only slightly lower explanatory power.
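The comparison between models with and without the interaction term can be sketched as follows. This is an illustrative example on simulated data, not the authors' data or our exact code; the variable names (housing, noneu, d_vote) and the simulated effect sizes are hypothetical.

```r
# Sketch on simulated data: how much explanatory power does an
# interaction term add over a main-effects baseline?
# All names and effect sizes are hypothetical.
set.seed(2)
n <- 2000
sim <- data.frame(housing = rbeta(n, 1, 8), noneu = rbeta(n, 1, 15))
sim$d_vote <- 0.037 + 0.02 * sim$housing +
  0.5 * sim$housing * sim$noneu + rnorm(n, sd = 0.028)

fit_main <- lm(d_vote ~ housing + noneu, data = sim)  # baseline
fit_int  <- lm(d_vote ~ housing * noneu, data = sim)  # adds housing:noneu

fit_compare <- data.frame(
  model = c("main effects", "with interaction"),
  r.squared = c(summary(fit_main)$r.squared, summary(fit_int)$r.squared),
  adj.r.squared = c(summary(fit_main)$adj.r.squared,
                    summary(fit_int)$adj.r.squared)
)
print(fit_compare)

# F-test for the nested comparison
print(anova(fit_main, fit_int))
```

A term can be statistically significant in the F-test while adding very little to R-squared, which is the pattern of concern here.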
Warning in eval(quote({: Some observations have missingness in the cluster
variable(s) but not in the outcome or covariates. These observations have been
dropped.
Observe, as with the primary finding, that none of the alternative models fit the data well, and that models without interaction effects provide only slightly lower explanatory power. Further, when the original model used for the main finding (applied above to the country as a whole) is fit to this subset, the interaction term is not significant.
Proportion of third-country nationals
In light of the eye tests below, we investigate the counterintuitive relationship between the proportion of third-country nationals living in a particular ward and the change in far-right vote share. In particular, if the mechanism driving the change in support for far-right parties is direct competition between third-country and Austrian nationals for public housing, then, assuming individuals have a preference for continuing to live near their current residence, we would expect that voters living in areas with more third-country nationals would tend to be more supportive of far-right parties.
It’s possible that these results are driven by homophily, i.e., voters residing in tracts with higher percentages of third-country nationals have political preferences that are more friendly to third-country nationals. To test this, we see if voters living in areas with few third-country nationals were also more likely to vote for far-right parties in 2002. To ensure that the controls make sense, we reverse-engineer pctforeign02.
We find the opposite—namely, that the percentage of third-country nationals is, if anything, weakly positively associated with support for far-right parties in previous elections, which casts doubt on the hypothesis that far-right voters are residentially segregated from third-country nationals.
This suggests that the evidence from Vienna specifically for the proposed mechanism—namely, that direct competition with third-country nationals for public housing resources pushes voters to support far-right parties—is weak.
Reweighting
If we had individual rather than aggregate data, we would most likely fit a model at the voter level, regressing whether or not an individual cast a vote for a far-right party in the 2006 federal election against whether they lived in public housing, the number of third-country nationals living in their neighborhood, etc. This paper uses aggregate data at the tract and municipality level as an approximation to the individual data. However, unless we weight the regression by the number of voters in a municipality, we are implicitly weighting voters differently in different municipalities or tracts. As a result, we rerun both models reweighting the observations by the number of voters in the administrative unit.
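A minimal sketch of this reweighting, assuming simulated data with a hypothetical per-unit voter count (voters) and hypothetical covariate names, compares the unweighted fit with one weighted by the number of voters in each unit:

```r
# Sketch on simulated data: unweighted vs. voter-weighted regression on
# aggregate units. All variable names and values are hypothetical.
set.seed(3)
n_units <- 300
sim <- data.frame(
  voters  = round(exp(rnorm(n_units, mean = 7, sd = 1))),  # skewed unit sizes
  housing = rbeta(n_units, 1, 8),
  noneu   = rbeta(n_units, 1, 15)
)
sim$d_vote <- 0.04 + 0.02 * sim$housing - 0.02 * sim$noneu +
  rnorm(n_units, sd = 0.03)

# Every administrative unit counts equally
fit_unweighted <- lm(d_vote ~ housing * noneu, data = sim)

# Every unit counts in proportion to the number of voters it contains
fit_weighted <- lm(d_vote ~ housing * noneu, data = sim, weights = voters)

print(cbind(unweighted = coef(fit_unweighted), weighted = coef(fit_weighted)))
```

With weights proportional to voters, the point estimates match those from a regression in which each voter is assigned their unit's aggregate values; the standard errors would still need to account for clustering within units.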
We see that both effects become more pronounced—first, that municipalities with higher numbers of third-country residents are substantially less likely to vote for far-right parties, and second, that municipalities with high proportions of people living in public housing are more likely to vote for far-right parties.
We can repeat the same analysis at the level of Vienna tracts for the second analysis.
Warning in eval(quote({: Some observations have missingness in the cluster
variable(s) but not in the outcome or covariates. These observations have been
dropped.
In this case, the weighting makes almost no difference to the model results.
We can also perform our alternative specification, where we adjust for the percentage of third-country residents in the Vienna analysis, with reweighting.
Again, this doesn’t seem to meaningfully affect the results.
Data Robustness Analysis
Graphical Exploration of Data (interocular impact)
To better understand the authors’ main claims, we attempt to visualize the trends they find in the raw data. We begin with the first analysis, which links support for far-right political parties in the 2006 federal election with the proportion of third-country nationals across Austrian municipalities.
Code
requireNamespace("plotly")
Loading required namespace: plotly
Code
## AUSTRIA
# Plot the relationship between % non-EU and change in vote share
suppressWarnings({
  austria_authors.df %>%
    ggplot(aes(x = pct_noneu_06, y = d_rr_06)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Change in far-right vote share") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    geom_smooth(method = lm)
} %>% plotly::ggplotly())
`geom_smooth()` using formula = 'y ~ x'
Code
# Plot the relationship between % non-EU and vote share
suppressWarnings({
  austria_authors.df %>%
    ggplot(aes(x = pct_noneu_06, y = rr_share_06)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Far-right vote share") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    geom_smooth(method = lm)
} %>% plotly::ggplotly())
`geom_smooth()` using formula = 'y ~ x'
Code
# Compare vote share in 2002 and 2006
suppressWarnings({
  austria_authors.df %>%
    select(pct_noneu_06, rr_share_06, rr_share_02) %>%
    pivot_longer(
      cols = starts_with("rr_share"),
      names_prefix = "rr_share_",
      names_to = "year",
      values_to = "vote_share"
    ) %>%
    ggplot(aes(x = pct_noneu_06, y = vote_share)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Far-right vote share") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    geom_smooth(method = lm) +
    facet_wrap(vars(year))
} %>% plotly::ggplotly())
`geom_smooth()` using formula = 'y ~ x'
Visual examination of the data indicates that (1) there is a very modest association between the percentage of third-country nationals and both the level of support and the change in the level of support for far-right parties in the 2006 elections; and (2) this trend is more pronounced in 2006 than in 2002. One of the article’s primary interests is, in addition, the interaction between the share of third-country nationals and the share of public housing: the claim that support is driven by competition between Austrian and third-country nationals for housing, which is most acute when public housing rates and the proportion of the population that is third-country nationals are both high. To visualize this, we plot the relationship between the percentage of third-country nationals and support for far-right parties, stratifying by the percentage of the population that lives in public housing.
Code
# Bin the percentage of the municipality in public housing and plot the
# change in vote share by % non-EU residents
cuts <- with(
  austria_authors.df,
  quantile(dv_pop_01, probs = seq(0, 1, 1/3), na.rm = TRUE)
)
suppressWarnings({
  austria_authors.df %>%
    mutate(pct_public_housing = cut(dv_pop_01, breaks = cuts, include.lowest = TRUE)) %>%
    # ~ 13 municipalities don't have data on public housing
    drop_na(pct_public_housing) %>%
    ggplot(aes(x = pct_noneu_06, y = d_rr_06)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Change in far-right vote share") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    geom_smooth(method = lm) +
    facet_wrap(vars(pct_public_housing))
} %>% plotly::ggplotly())
`geom_smooth()` using formula = 'y ~ x'
We see that while the trend line does appear to get steeper, it is estimated with a considerable amount of uncertainty; in particular, in all three cases, the trend is not visually apparent, and the slope of the fitted linear model has a high degree of uncertainty about sign.
We additionally color the observations based on the baseline level of support for far-right parties to see if an interaction with public housing rates and proportion of third-country nationals is apparent. In particular, it seems plausible that these relationships might be strengthened in places where the baseline support for far-right parties (as measured by their support in 2002) is already high.
Code
# Add coloring based on how far-right the municipality was in the previous
# election
suppressWarnings({
  austria_authors.df %>%
    mutate(pct_public_housing = cut(dv_pop_01, breaks = cuts, include.lowest = TRUE)) %>%
    # ~ 13 municipalities don't have data on public housing
    drop_na(pct_public_housing) %>%
    ggplot(aes(x = pct_noneu_06, y = d_rr_06, color = rr_share_02)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Change in far-right vote share",
         color = "% far-right in\nprevious election") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    scale_color_gradient(low = "blue", high = "red") +
    geom_smooth(method = lm) +
    facet_wrap(vars(pct_public_housing))
} %>% plotly::ggplotly())
`geom_smooth()` using formula = 'y ~ x'
While some extreme outliers seem to be in places that were already far-right in 2002, no trend is immediately apparent.
Next, we consider the second analysis, which tries to more directly test whether direct competition for public housing between Austrian and third-country nationals explains increasing support for far-right parties in Vienna.
We begin by visualizing the relationship between the proportion of third-country nationals living in a tract, the proportion of residents living in public housing in a tract, and change in far-right vote share.
Code
## VIENNA
# Plot the relationship between % non-EU and change in vote share
suppressWarnings({
  vienna_authors.df %>%
    ggplot(aes(x = pctforeign, y = dv)) +
    geom_point() +
    labs(x = "% non-EU residents in tract",
         y = "Change in far-right vote share") +
    # NOTE: There are ~15 tracts with no foreign residents
    scale_x_log10() +
    geom_smooth(method = lm)
} %>% plotly::ggplotly())
`geom_smooth()` using formula = 'y ~ x'
Code
# Plot the relationship between % in public housing and change in vote share
suppressWarnings({
  vienna_authors.df %>%
    ggplot(aes(x = pctpublic_w_zsp, y = dv)) +
    geom_point() +
    labs(x = "% of residents in public housing",
         y = "Change in far-right vote share") +
    # NOTE: There are ~900 tracts with no one in public housing
    scale_x_log10() +
    geom_smooth(method = lm)
} %>% plotly::ggplotly())
`geom_smooth()` using formula = 'y ~ x'
We notice two important facts. First, the relationship between the percentage of individuals living in public housing and the change in far-right vote share is much more pronounced and positive in Vienna than in the national data. Second, the relationship between the percentage of third-country nationals and the far-right vote share is actually very pronounced and negative. To understand the interrelationship between these two factors, we stratify by the proportion of individuals living in public housing and plot the relationship between the percentage of third-country nationals and change in far-right vote share, and see that while the rate of support does seem to rise in the highest bin (i.e., those tracts where the largest number of people live in public housing), the trend within each bin remains fairly negative.
Code
# Plot the relationship between % non-EU and change in vote share, stratified by
# the rate of public housing
cuts <- with(
  vienna_authors.df,
  quantile(pctpublic_w_zsp, probs = c(0, 1/2, 3/4, 1), na.rm = TRUE)
)
suppressWarnings({
  vienna_authors.df %>%
    mutate(public_housing = cut(pctpublic_w_zsp, cuts, include.lowest = TRUE)) %>%
    ggplot(aes(x = pctforeign, y = dv)) +
    geom_point() +
    labs(x = "% non-EU residents in tract",
         y = "Change in far-right vote share") +
    # NOTE: There are ~15 tracts with no foreign residents
    scale_x_log10() +
    geom_smooth(method = lm) +
    facet_wrap(vars(public_housing))
} %>% plotly::ggplotly())
`geom_smooth()` using formula = 'y ~ x'
Code
rm(cuts)
Outlier analysis
First, we examine the outliers in the main variables from Table 1 (Austrian sample). The following plots visualize the relationships between the main variables of interest: the percentage of non-EU residents, the percentage of people living in public housing, and the change in far-right vote share. From each of these variables, we drop the largest and smallest 1% of values. The plots then compare the linear dependence between the variables before and after this trimming.
`geom_smooth()` using formula = 'y ~ x'
Code
rm(p1,p2,p3)
From these plots, we can observe that after removing outliers, the R-squared values decrease substantially: from 0.028 to 0.019 (a 32% decrease), from 0.013 to 0.010 (a 23% decrease), and from 0.008 to 0.003 for the key interaction (a 62.5% decrease).
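The trimming step described above can be sketched as follows. This is an illustration on simulated data, not our exact code: the helper trim_extremes and all variable names are hypothetical, and it assumes that the 1% and 99% cut points are computed on the full sample separately for each variable.

```r
# Sketch: drop rows falling in the bottom or top 1% of any key variable,
# then compare model fit. Helper and variable names are hypothetical.
trim_extremes <- function(data, vars, lower = 0.01, upper = 0.99) {
  keep <- rep(TRUE, nrow(data))
  for (v in vars) {
    # Cut points are computed on the full (untrimmed) sample
    bounds <- quantile(data[[v]], probs = c(lower, upper), na.rm = TRUE)
    in_range <- data[[v]] >= bounds[[1]] & data[[v]] <= bounds[[2]]
    # Rows with a missing value in a key variable are dropped as well
    keep <- keep & !is.na(in_range) & in_range
  }
  data[keep, , drop = FALSE]
}

set.seed(4)
n <- 1000
sim <- data.frame(housing = rbeta(n, 1, 8), noneu = rbeta(n, 1, 15))
sim$d_vote <- 0.04 + 0.3 * sim$housing * sim$noneu + rnorm(n, sd = 0.03)

trimmed <- trim_extremes(sim, c("housing", "noneu", "d_vote"))

r2_full <- summary(lm(d_vote ~ housing * noneu, data = sim))$r.squared
r2_trim <- summary(lm(d_vote ~ housing * noneu, data = trimmed))$r.squared

print(c(n_full = nrow(sim), n_trimmed = nrow(trimmed),
        r2_full = r2_full, r2_trimmed = r2_trim))
```

Comparing the two R-squared values, together with the coefficient on the interaction term, indicates how much of the fit depends on the extreme observations.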
Now, we replicate the baseline model from Table 1 (a regression of the change in far-right vote share on public housing, non-EU residents, and their interaction).
Code
library("lfe")
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
We can see that the initial baseline model is reproducible in terms of effect sizes and standard errors. But once we change subsamples, the results change as well. Excluding the 1% highest values of all three key variables changes the estimate for the interaction term: the coefficient drops to 0.47 with a p-value of 0.091, compared to a p-value < 0.001 in the baseline model.
Next, we cluster errors by district (bezirk). For the initial sample, the result does not change drastically (the p-value for the interaction term increases to 0.006). When the highest values are dropped, however, the term moves further from significance (p = 0.2).
Arguments `vcov` and `vcov_args` have not been specified in `tidy_robust()`.
Specify at least one to obtain robust standard errors.
tidy_robust(): Robust estimation with
`parameters::model_parameters(model = x, ci = 0.95, robust = "HC1")`
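To make the clustering of errors by district concrete, the following sketch computes cluster-robust standard errors by hand in base R, using the common CR1 small-sample correction. It is an illustration on simulated data: the helper cluster_robust_se and all variable names are hypothetical, and packages may apply different small-sample corrections, so the values need not match theirs exactly.

```r
# Sketch: cluster-robust (CR1) standard errors for an lm fit, base R only.
# The helper and all variable names are hypothetical.
cluster_robust_se <- function(model, cluster) {
  X <- model.matrix(model)
  u <- residuals(model)
  stopifnot(length(cluster) == nrow(X), !anyNA(cluster))
  bread <- solve(crossprod(X))              # (X'X)^{-1}
  scores <- rowsum(X * u, group = cluster)  # summed scores, one row per cluster
  meat <- crossprod(scores)
  G <- nrow(scores); n <- nrow(X); k <- ncol(X)
  adj <- (G / (G - 1)) * ((n - 1) / (n - k))  # CR1 small-sample correction
  sqrt(diag(adj * bread %*% meat %*% bread))
}

set.seed(5)
n <- 600
sim <- data.frame(
  district = sample(1:40, n, replace = TRUE),
  housing  = rbeta(n, 1, 8),
  noneu    = rbeta(n, 1, 15)
)
district_shock <- rnorm(40, sd = 0.02)  # shared shock within each district
sim$d_vote <- 0.04 + 0.3 * sim$housing * sim$noneu +
  district_shock[sim$district] + rnorm(n, sd = 0.03)

fit <- lm(d_vote ~ housing * noneu, data = sim)

se_classical <- sqrt(diag(vcov(fit)))
se_cluster <- cluster_robust_se(fit, sim$district)
print(cbind(classical = se_classical, clustered = se_cluster))
```

When outcomes share a district-level component, as simulated here, the clustered standard errors can differ noticeably from the classical ones, which is why the choice of clustering level can move p-values.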
sqe1 <- double(nrow(austria_authors.df))
for (i in seq(nrow(austria_authors.df))) {
  row <- austria_authors.df[i, ]
  m <- lm(d_rr_06 ~ dv_pop_01 * pct_noneu_06, data = austria_authors.df[-i, ])
  sqe1[[i]] <- (row$d_rr_06 - predict(m, row))^2
}
print(glue::glue("LOO RMSE for interaction mode: { sqrt(mean(sqe1, na.rm = TRUE)) }"))
LOO RMSE for interaction mode: 0.0283408162629289
Code
sqe2 <- double(nrow(austria_authors.df))
for (i in seq(nrow(austria_authors.df))) {
  row <- austria_authors.df[i, ]
  m <- lm(d_rr_06 ~ dv_pop_01 + pct_noneu_06, data = austria_authors.df[-i, ])
  sqe2[[i]] <- (row$d_rr_06 - predict(m, row))^2
}
print(glue::glue("LOO RMSE for main effects: { sqrt(mean(sqe2, na.rm = TRUE)) }"))
LOO RMSE for main effects: 0.0284132384077463
Code
rm(sqe1)
rm(sqe2)
Code
sqe1 <- double(nrow(vienna_authors.df))
for (i in seq(nrow(vienna_authors.df))) {
  row <- vienna_authors.df[i, ]
  m <- lm(dv ~ pctrental * pctpublic_w_zsp, data = vienna_authors.df[-i, ])
  sqe1[[i]] <- (row$dv - predict(m, row))^2
}
print(glue::glue("LOO RMSE for interaction mode: { sqrt(mean(sqe1, na.rm = TRUE)) }"))
LOO RMSE for interaction mode: 0.0431238824470698
Code
sqe2 <- double(nrow(vienna_authors.df))
for (i in seq(nrow(vienna_authors.df))) {
  row <- vienna_authors.df[i, ]
  m <- lm(dv ~ pctrental + pctpublic_w_zsp, data = vienna_authors.df[-i, ])
  sqe2[[i]] <- (row$dv - predict(m, row))^2
}
print(glue::glue("LOO RMSE for main effects: { sqrt(mean(sqe2, na.rm = TRUE)) }"))
LOO RMSE for main effects: 0.0439284411572522
Code
rm(sqe1)
rm(sqe2)
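For ordinary least squares, the leave-one-out loops above have a closed-form equivalent: the prediction error for observation i when it is held out equals its ordinary residual divided by one minus its leverage. The sketch below illustrates this shortcut on simulated data; the helper loo_rmse and the variable names are hypothetical, and it assumes complete cases (rows dropped by lm for missingness are simply excluded).

```r
# Sketch: leave-one-out RMSE for lm fits without refitting, using the
# identity  e_loo[i] = e[i] / (1 - h[i])  where h is the leverage.
# Helper and variable names are hypothetical.
loo_rmse <- function(model) {
  e <- residuals(model)
  h <- hatvalues(model)
  sqrt(mean((e / (1 - h))^2))
}

set.seed(6)
n <- 200
sim <- data.frame(housing = rbeta(n, 1, 8), noneu = rbeta(n, 1, 15))
sim$d_vote <- 0.04 + 0.3 * sim$housing * sim$noneu + rnorm(n, sd = 0.03)

fit_int  <- lm(d_vote ~ housing * noneu, data = sim)
fit_main <- lm(d_vote ~ housing + noneu, data = sim)

print(c(interaction = loo_rmse(fit_int), main_effects = loo_rmse(fit_main)))
```

This gives the same quantity as refitting the model once per observation, at a fraction of the cost, which makes it practical to compare many candidate specifications.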
Conclusion
We find that the research is computationally replicable, but the evidence for the main causal claims is overstated. While the analysis concentrates on patterns in outcomes that are likely relevant to understanding the underlying data-generating process, the statistical models supporting the causal claims explain only a small proportion of the overall variance in outcomes. Furthermore, the analysis elides relevant competing models. For example, models with only main effects fit the data nearly as well as those including the interaction term that supports the main causal claim.
The relative weakness of the claim is obscured in the published analysis because neither overall goodness of fit nor comparisons to ‘naive’ baseline models are included. While conceptual reproducibility analysis would be useful for exploring more complex alternative models, this route is obstructed by the absence of citations, documentation, and linking codes that would support reliable reanalysis using the original data, or augmentation of the authors’ data with additional measures. We conjecture that, as a general practice, research reliability would be increased by including these elements in publication and data sharing.
References
Altman, Micah, Jeff Gill, and Michael McDonald. 2004. Numerical issues in statistical computing for the social scientist. Wiley series in probability and statistics. Hoboken, NJ: Wiley-Interscience.
Cavaille, Charlotte. 2023. “Replication Data for: ‘How Distributional Conflict over in-Kind Benefits Generates Support for Far-Right Parties.’” Harvard Dataverse. https://doi.org/10.7910/DVN/SYNP73.
Cavaillé, Charlotte, and Jeremy Ferwerda. 2023. “How Distributional Conflict over In-Kind Benefits Generates Support for Far-Right Parties.” The Journal of Politics 85 (1): 19–33. https://doi.org/10.1086/720643.
All authors declare that they have no financial support or conflict of interest in this publication.
Source Code
---title: "altman-demetrio-tarasenko Stockholm 2023 replication"format: htmltoc: TRUEcode-tools: TRUEcode-fold: TRUEembed-resources: TRUEeditor: visualbibliography: references.bib---# **A comment on Cavaillé and Ferwerda (2022)**Micah Altman[^1][^2] (MIT) [escience\@mit.edu](mailto:escience@mit.edu){.email}, Hans Gaebler (Harvard) [jgaebler\@fas.harvard.edu](mailto:jgaebler@fas.harvard.edu){.email}, Georgy Tarasenko (Cornell) [gt298\@cornell.edu](mailto:gt298@cornell.edu){.email}[^1]: Corresponding author.[^2]: All authors declare that they have no financial support or conflict of interest in this publication.## AbstractCavaillé and Ferwerda [@cavaillé2023] examine the effect of granting immigrants access to the welfare state in Austria during 2006 on the support for far-right parties. In their preferred specification, they find that -- the interaction of the proportion of non-eu residents in a municipality with the proportion of population in that municipality has an effect on the change in vote share for far right parties of +0.67 (p \< .05) supporting the main substantive claim that increasing welfare benefits to immigrants increases support for far-right parties. Moderating this, based on a subset of data from Vienna, they find that -- percentage of population in rental housing and the percentage of population in public housing are both substantive and significant predictors of change in far-right party vote share with magnitude of .03 and .09 (respectively) and p-values of \< .05. 
Based on this estmate they conclude that support was increased by the the high proportion of public housing beneficiaries or low-end rentals within districts.We first reproduce the paper's main and secondary analyses, and subject these to computational robustness tests.We then conduct several types of conceptual replication -- examining the robustness of results to selection of covariates, reweighting, and the sensitivity of results to resampling and outliers.We find research is computationally replicable but the evidence for the main causal claims are overstated as it sensitive to a small set of observations, does not explain most of the variation in outcomes, and does not out-compete simpler causal explanations.## Introduction```{r setup}#| include: false#| echo: falselibrary(tidyverse)library(magrittr,include.only="%<>%")library(broom)library(gt)library(patchwork)tidy_results<-function(resls){ purrr::map(resls, \(x) tidy(x)) %>%list_rbind(names_to="repl_id")}tidy_summary<-function(resls){ purrr::map(resls, \(x) glance(summary(x))) %>%list_rbind(names_to="repl_id")}# workaround absence of tidier for some of the packages usedglance.summary.lm_robust <-function(x,...) { x[c("r.squared","adj.r.squared","df.residual","res_var","nobs")] %>%as_tibble()}glance.summary.glm <-function(x,...) { x[c("aic","df.residual")] %>%as_tibble()}theme_set(theme_bw())```Cavaillé and Ferwerda [@cavaillé2023] tested the effect of granting immigrants access to the welfare state in Austria during 2006 on the support for far-right parties. We focus on their two main claims, as described in the abstract (pg 19):> we show that the **reform increased support for far-right parties with welfare chauvinist platforms**. 
> Electoral ward data suggest that this **response was concentrated in districts with a high proportion of public housing beneficiaries or low-end rentals.** Our findings provide novel evidence that **distributional conflict can accelerate the rise of far-right parties in countries with substantial in-kind welfare programs**

In their preferred specification, using OLS with clustered standard errors, they find that the interaction of the proportion of non-EU residents in a municipality with the proportion of the population living in public housing in that municipality has an effect of +0.67 (p \< .05) on the change in vote share for far-right parties (pg 26, Table 1, col 1), supporting the main substantive claim that increasing welfare benefits to immigrants increases support for far-right parties (pg. 19). Moderating this, based on a subset of data from Vienna, they find that the percentage of the population in rental housing and the percentage of the population in public housing are both substantive and significant predictors of change in far-right party vote share, with magnitudes of .03 and .09 (respectively) and p-values \< .05 (pg 30, Table 2, col 1). Based on this estimate they conclude that support was accelerated by the high proportion of public housing beneficiaries or low-end rentals within districts (pg 19).

We obtained the published replication data set [@Cavaille2023] from the Harvard Dataverse archive, where it had been deposited for public use by the authors. The replication data set included the final processed data used to produce the paper results, along with R code to replicate figures and tables. The data were sufficient to check computational reproducibility but posed challenges for conceptual reproducibility, because the archive provided neither copies of the source data nor links to or citations of the original data sources (which were described only generally). 
Further, although the replication data did include additional measures not used in publication, these were largely undocumented, and hence difficult to reliably interpret.

We first reproduced the paper's main and secondary analyses, and subjected these to computational robustness tests. The computational reproductions showed both the main and secondary results to be reproducible. As an unintended consequence, reproducing the results revealed the substantively poor fit of these models: the primary model has an R-squared of .0356. This finding informed further replication analysis.

We then conducted several types of conceptual replication, examining the robustness of results to selection of covariates, reweighting, and the sensitivity of results to resampling and outliers.

Prompted by the unexpectedly poor fit of the primary model, we performed a conceptual replication that applied the authors' linear model with clustered errors to alternate covariates, and to models that did not include the interaction effects that are central to the paper's proposed causal explanation. This reveals that the interaction term does not provide substantial improvement in explanatory power over a more naive baseline.

The authors suggest, in their first analysis of nation-wide election data, that voters in municipalities with high proportions of residents living in public housing as well as comparatively high proportions of third-country nationals voted more for far-right parties as a result of the demand shock on public housing. We undertake a conceptual replication of this result using their data on the city of Vienna. In particular, we look at census tracts (as a proxy for neighborhood) to see if tracts with a higher proportion of third-country nationals and higher proportions of individuals living in public housing tend to vote for far-right parties at higher rates. 
We find that implementing this conceptual replication increases the magnitude of the main point estimate for the interaction between these two factors, but the estimate is no longer statistically significant at the 5% level.

The authors fit the models in both of their main analyses at the level of administrative units, i.e., all municipalities are equally weighted in their nation-wide analysis, and all tracts are equally weighted in their analysis of Vienna. However, it is unclear if this weighting is substantively appropriate. To test the robustness of their conclusions to this choice, we reweight their model so that each administrative unit has weight equal to the number of voters it contains. Implementing this robustness check in their primary analysis increases the magnitude of their main point estimate for the interaction of the proportion of residents living in public housing with the proportion of third-country nationals residing in a municipality, which remains significant at the 5% level. Implementing this robustness check in their secondary analysis has almost no effect on the magnitude or the statistical significance of the main point estimates.

Next, to better understand the authors' main claims, to see how well their models fit the data, and to search for potential anomalies not evident in low-dimensional summaries, we attempted to visually replicate their primary and secondary analyses. Based on this graphical exploration, we carried out an outlier analysis.

We assessed the robustness of the findings from the Austrian sample by examining potential outliers in the key variables. We observed that the previously significant interaction effect between the share of non-EU residents and the share of public housing loses its significance when we exclude abnormally large values of key variables. Covariate balance analysis highlights substantial differences between observations with outliers and those without. 
When we reevaluate the main argument using the sample of outliers, we find that all the initial effects become notably stronger in both magnitude and significance. This suggests that the authors' primary argument may not generalize to all Austrian districts within the sample.

Finally, given the substantive importance placed on the interaction term in the authors' primary analysis, as well as the generally low goodness-of-fit of the models considered, we evaluated the main models of the primary and secondary analyses using out-of-sample tests of predictive accuracy. We find in both cases that the models with and without the interaction term have essentially the same mean squared error, suggesting that the substantive interpretation of the interaction effect may not be appropriate.

## Reproducibility

### Simple Direct Reproduction (Calibration)

As a baseline check of the model and of our understanding of it, we conducted a simple computational replication using the authors' supplied data and code, and compared the output to the published results.

#### Reproduction of Table 1 - Model 1 (Primary Result)

```{r main-result-reproduction}
#| echo: false
requireNamespace("haven")
requireNamespace("estimatr")

# main replication data from authors
austria_authors.df <- haven::read_dta("authors replication materials/Austria_final.dta")

# verbatim results from published article for main model
# NOTE: p-value represents upper bound of actual p-value, since only a '*' notation used in reporting
tidy_m1_original.df <- structure(
  list(
    term = c("(Intercept)", "dv_pop_01", "pct_noneu_06", "dv_pop_01:pct_noneu_06"),
    estimate = c(0.04, 0.02, -0.02, 0.67),
    std.error = c(0, 0.03, 0.03, 0.17),
    p.value = c(0.05, 1, 0.05, 0.05),
    repl_id = c("original", "original", "original", "original")
  ),
  row.names = c(NA, -4L),
  class = "data.frame"
)

# reproduce authors' model using original code for OLS with 'robust' errors
m1.formula <- formula("d_rr_06 ~ dv_pop_01*pct_noneu_06")
results_m1_computational.lmr <- estimatr::lm_robust(m1.formula, data 
= austria_authors.df)

repro_m1.df <- tidy_results(list(comp = results_m1_computational.lmr)) %>%
  bind_rows(tidy_m1_original.df)
repro_m1_summary.df <- tidy_summary(list(comp = results_m1_computational.lmr))

repro_m1.df %>% group_by(repl_id) %>% gt()
repro_m1_summary.df %>% group_by(repl_id) %>% gt()
```

#### Reproduction of Table 2 - Model 1 (Secondary Result)

```{r secondary-result-reproduction}
vienna_authors.df <- haven::read_dta("authors replication materials/vienna_final.dta")

m2.formula <- formula("dv ~ (pctrental + pctpublic_w_zsp)")
results_m2_computational.lmr <- estimatr::lm_robust(m2.formula, data = vienna_authors.df, clusters = tract_key)

tidy_m2_original.df <- structure(
  list(
    term = c("(Intercept)", "pctrental", "pctpublic_w_zsp"),
    estimate = c(0.04, 0.03, 0.09),
    std.error = c(0.01, 0.01, 0.01),
    p.value = c(0.05, 0.05, 0.05),
    repl_id = c("original", "original", "original")
  ),
  row.names = c(NA, -3L),
  class = "data.frame"
)

repro_m2.df <- tidy_results(list(comp = results_m2_computational.lmr)) %>%
  bind_rows(tidy_m2_original.df)
repro_m2_summary.df <- tidy_summary(list(comp = results_m2_computational.lmr))

repro_m2.df %>% group_by(repl_id) %>% gt()
repro_m2_summary.df %>% group_by(repl_id) %>% gt()
```

#### Summary - Direct Reproducibility

The computational reproductions showed both the main and secondary results to be reproducible.

As an unintended consequence, reproducing the results revealed the substantively poor fit of these models: the primary model has an R-squared of .0356. This finding informed the replication analysis.

### Computational robustness

Generally, even with a fixed statistical model family and specification results may vary with the estimation algorithm used and specific software's implementation of it. 
[@altman2004] We evaluate the computational robustness of the model by using alternative algorithms and implementations.

#### Primary Result

```{r alternate-estimations-m1}
requireNamespace("arm")
results_m1_statlm.lm <- stats::lm(m1.formula, data = austria_authors.df)
results_m1_statglm.glm <- stats::glm(m1.formula, data = austria_authors.df)
results_m1_statbayeslm.glm <- arm::bayesglm(m1.formula, data = austria_authors.df, prior.scale = Inf, prior.df = Inf)

# Note: could also add nls, bayesglm, mle2 -- would require re-expressing current formula in different syntax, and renaming results matrix
ml.ls <- list(
  lmrobust = results_m1_computational.lmr,
  lm = results_m1_statlm.lm,
  glm = results_m1_statglm.glm,
  bayes = results_m1_statbayeslm.glm
)
alt_est_m1.df <- tidy_results(ml.ls)
alt_est_m1_summary.df <- tidy_summary(ml.ls)

alt_est_m1.df %>% group_by(repl_id) %>% gt()
alt_est_m1_summary.df %>% group_by(repl_id) %>% gt()

rm(ml.ls, results_m1_statlm.lm, results_m1_statglm.glm, results_m1_statbayeslm.glm)
```

#### Secondary Result

```{r alternate-estimations-m2}
requireNamespace("arm")
results_m2_statlm.lm <- stats::lm(m2.formula, data = vienna_authors.df)
results_m2_statglm.glm <- stats::glm(m2.formula, data = vienna_authors.df)
results_m2_statbayeslm.glm <- arm::bayesglm(m2.formula, data = vienna_authors.df, prior.scale = Inf, prior.df = Inf)

# Note: could also add nls, bayesglm, mle2 -- would require re-expressing current formula in different syntax, and renaming results matrix
ml.ls <- list(
  lmrobust = results_m2_computational.lmr,
  lm = results_m2_statlm.lm,
  glm = results_m2_statglm.glm,
  bayes = results_m2_statbayeslm.glm
)
alt_est_m2.df <- tidy_results(ml.ls)
alt_est_m2_summary.df <- tidy_summary(ml.ls)

alt_est_m2.df %>% group_by(repl_id) %>% gt()
alt_est_m2_summary.df %>% group_by(repl_id) %>% gt()

rm(ml.ls, results_m2_statlm.lm, results_m2_statglm.glm, results_m2_statbayeslm.glm)
```

#### Summary - Computation Robustness of Reproducibility

The computational reproductions showed both the main and secondary results to be robust to alternative 
choices of algorithm and software implementation.

## Replication:

### Conceptual Replication: Alternate OLS Specification (covariate robustness)

Prompted by the unexpectedly poor fit of the primary model, we performed a conceptual replication that applied the authors' linear model with clustered errors to alternate covariates, and to models that did not include the interaction effects that are central to the paper's proposed causal explanation.

#### Alternate Covariates & Model 1

```{r alternate-variables-m1}
# NOTE: could also explore alternate methods for computing robust standard errors (e.g. sensemaker, lmtest, sandwich)
# Each formula string is kept on a single line so that formula() parses the whole expression.
results_m1_clustered.lmr <- estimatr::lm_robust(m1.formula, data = austria_authors.df, clusters = bezirk)

results_m1_interactionsonly.lmr <- estimatr::lm_robust(
  formula("d_rr_06 ~ dv_pop_01:pct_noneu_06 -dv_pop_01 -pct_noneu_06"),
  data = austria_authors.df
)

results_m1_kitchensink.lmr <- estimatr::lm_robust(
  formula("d_rr_06 ~ dv_pop_01:pct_noneu_06 +educ_tertiary +avg_income +lab_pct_manufact_01 +lab_pct_unemp +welfare_cap_06 +health_cap_06 +education_cap_06 +foreignborn_delta+citizen_eu_growth_pct +vacancy_01_public"),
  data = austria_authors.df
)

results_m1_kitchensinkmain.lmr <- estimatr::lm_robust(
  formula("d_rr_06 ~ dv_pop_01+pct_noneu_06 +educ_tertiary +avg_income +lab_pct_manufact_01 +lab_pct_unemp +welfare_cap_06 +health_cap_06 +education_cap_06 +foreignborn_delta+citizen_eu_growth_pct +vacancy_01_public"),
  data = austria_authors.df
)

# vacancy has high missingness
results_m1_kitchensink_novacancy.lmr <- estimatr::lm_robust(
  formula("d_rr_06 ~ dv_pop_01:pct_noneu_06 +educ_tertiary +avg_income +lab_pct_manufact_01 +lab_pct_unemp +welfare_cap_06 +health_cap_06 +education_cap_06 +foreignborn_delta+citizen_eu_growth_pct "),
  data = austria_authors.df
)

results_m1_kitchensinkmain_novacancy.lmr <- estimatr::lm_robust(
  formula("d_rr_06 ~ dv_pop_01+pct_noneu_06 +educ_tertiary +avg_income +lab_pct_manufact_01 +lab_pct_unemp +welfare_cap_06 +health_cap_06 +education_cap_06 +foreignborn_delta+citizen_eu_growth_pct "),
  data = austria_authors.df
)

results_m1_mainonly.lmr <- estimatr::lm_robust(formula("d_rr_06 ~ dv_pop_01 + pct_noneu_06 "), data = austria_authors.df)
results_m1_euonly.lmr <- estimatr::lm_robust(formula("d_rr_06 ~ pct_noneu_06 "), data = austria_authors.df)
results_m1_poponly.lmr <- estimatr::lm_robust(formula("d_rr_06 ~ dv_pop_01 "), data = austria_authors.df)

ml.ls <- list(
  author_model1 = results_m1_computational.lmr,
  author_clusterederr = results_m1_clustered.lmr,
  loaded_model = results_m1_kitchensink.lmr,
  loaded_main = results_m1_kitchensinkmain.lmr,
  loaded_nv = results_m1_kitchensink_novacancy.lmr,
  loaded_main_nv = results_m1_kitchensinkmain_novacancy.lmr,
  interaction = results_m1_interactionsonly.lmr,
  maineffects = results_m1_mainonly.lmr,
  single_eu = results_m1_euonly.lmr,
  single_pop = results_m1_poponly.lmr
)
alt_var_m1.df <- tidy_results(ml.ls)
alt_var_m1_summary.df <- tidy_summary(ml.ls)

alt_var_m1.df %>% group_by(repl_id) %>% gt()
alt_var_m1_summary.df %>% group_by(repl_id) %>% gt()

rm(
  ml.ls,
  results_m1_clustered.lmr,
  results_m1_kitchensink.lmr,
  results_m1_interactionsonly.lmr,
  results_m1_kitchensink_novacancy.lmr,
  results_m1_kitchensinkmain_novacancy.lmr,
  results_m1_mainonly.lmr,
  results_m1_euonly.lmr,
  results_m1_poponly.lmr,
  results_m1_kitchensinkmain.lmr
)
```

Observe that none of the alternative models fit the data well. 
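
The kind of nested comparison used here, a model with main effects only against one that adds the interaction, can be sketched on simulated data. This is a minimal illustration and not the authors' analysis: the data frame `sim.df` and its columns `housing`, `noneu`, and `outcome` are hypothetical stand-ins, and the sketch uses `stats::lm` with classical standard errors rather than `estimatr::lm_robust`.

```r
# Illustrative only: simulated data, NOT the replication data set.
set.seed(42)
n <- 500
sim.df <- data.frame(
  housing = runif(n),  # hypothetical stand-in for dv_pop_01
  noneu   = runif(n)   # hypothetical stand-in for pct_noneu_06
)
# Outcome with a weak interaction effect buried in a lot of noise
sim.df$outcome <- 0.04 + 0.5 * sim.df$housing * sim.df$noneu + rnorm(n, sd = 1)

# Nested specifications: main effects only vs. main effects plus interaction
fit_main  <- lm(outcome ~ housing + noneu, data = sim.df)
fit_inter <- lm(outcome ~ housing * noneu, data = sim.df)

# R-squared can only go up when a term is added; the question is by how much
r2 <- c(
  main        = summary(fit_main)$r.squared,
  interaction = summary(fit_inter)$r.squared
)
print(round(r2, 4))

# Classical F-test of the nested models (not robust to heteroskedasticity)
print(anova(fit_main, fit_inter))
```

When the gain in R-squared from the interaction term is small relative to the unexplained variance, the interaction adds little explanatory power even if its coefficient is statistically significant.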
Moreover, models without interaction effects provide only slightly lower explanatory power.

#### Alternate Covariates & Model 2

```{r alternate-variables-m2}
#m2.formula<- formula("dv ~ (pctrental + pctpublic_w_zsp)")
#results_m2_computational.lmr <-
# estimatr::lm_robust(m2.formula, data = vienna_authors.df,
# clusters = tract_key)
# m1.formula <- formula("d_rr_06 ~ dv_pop_01*pct_noneu_06")
results_m2_interaction.lmr <- estimatr::lm_robust(
  formula("dv ~ pctrental*pctpublic_w_zsp"),
  data = vienna_authors.df,
  clusters = tract_key
)
results_m2_m1model.lmr <- estimatr::lm_robust(
  formula("dv ~ pctpublic_w_zsp*pctforeign"),
  data = vienna_authors.df,
  clusters = tract_key
)
results_m2_rental.lmr <- estimatr::lm_robust(
  formula("dv ~ pctrental"),
  data = vienna_authors.df,
  clusters = tract_key
)
results_m2_housing.lmr <- estimatr::lm_robust(
  formula("dv ~ pctpublic_w_zsp"),
  data = vienna_authors.df,
  clusters = tract_key
)

ml.ls <- list(
  author_model2 = results_m2_computational.lmr,
  interaction = results_m2_interaction.lmr,
  m1spec = results_m2_m1model.lmr,
  housing_only = results_m2_housing.lmr,
  rental_only = results_m2_rental.lmr
)
alt_var_m2.df <- tidy_results(ml.ls)
alt_var_m2_summary.df <- tidy_summary(ml.ls)

alt_var_m2.df %>% group_by(repl_id) %>% gt()
alt_var_m2_summary.df %>% group_by(repl_id) %>% gt()

rm(
  ml.ls,
  results_m2_interaction.lmr,
  results_m2_m1model.lmr,
  results_m2_housing.lmr,
  results_m2_rental.lmr
)
```

Observe, as with the primary finding, that none of the alternative models fit the data well, and that models without interaction effects provide only slightly lower explanatory power. Further, when the original specification used for the main finding (fit above to the country as a whole) is fit to this subset, the interaction term is not significant.

#### Proportion of third-country nationals

In light of the eye tests below, we investigate the counterintuitive relationship between the proportion of third-country nationals living in a particular ward and the change in far-right vote share. 
In particular, if the mechanism driving the change in support for far-right parties is direct competition between third-country and Austrian nationals for public housing, then, assuming individuals have a preference for continuing to live near their current residence, we would expect that voters living in areas with more third-country nationals would tend to be more supportive of far-right parties.

```{r vienna-mechanism1}
results_m2_pctforeign <- estimatr::lm_robust(
  dv ~ pctforeign * pctpublic_w_zsp,
  data = vienna_authors.df,
  clusters = tract_key
)
tidy_results(list("pctforeign" = results_m2_pctforeign)) %>% gt()
tidy_summary(list("pctforeign" = results_m2_pctforeign)) %>% gt()

results_m2_pctforeign_w_controls <- estimatr::lm_robust(
  dv ~ pctforeign * pctpublic_w_zsp + lab_pct_pensioners + educ_tertiary,
  data = vienna_authors.df,
  clusters = tract_key
)
tidy_results(list("pctforeign_controls" = results_m2_pctforeign_w_controls)) %>% gt()
tidy_summary(list("pctforeign_controls" = results_m2_pctforeign_w_controls)) %>% gt()
```

It's possible that these results are driven by homophily, i.e., voters residing in tracts with higher percentages of third-country nationals have political preferences that are more friendly to third-country nationals. To test this, we see if voters living in areas with few third-country nationals were also more likely to vote for far-right parties in 2002. 
To ensure that the controls make sense, we reverse-engineer `pctforeign02`.

```{r vienna-mechanism2}
placebo_df <- vienna_authors.df %>%
  mutate(pctforeign02 = pctforeign / (1 + pctforeign_delta))

results_m2_pctforeign_placebo <- estimatr::lm_robust(
  farright_share2002 ~ pctforeign02 * pctpublic_w_zsp,
  data = placebo_df,
  clusters = tract_key
)
tidy_results(list("pctforeign_placebo" = results_m2_pctforeign_placebo)) %>% gt()
tidy_summary(list("pctforeign_placebo" = results_m2_pctforeign_placebo)) %>% gt()

results_m2_pctforeign_w_controls_placebo <- estimatr::lm_robust(
  farright_share2002 ~ pctforeign02 * pctpublic_w_zsp + lab_pct_pensioners + educ_tertiary,
  data = placebo_df,
  clusters = tract_key
)
tidy_results(list("pctforeign_controls_placebo" = results_m2_pctforeign_w_controls_placebo)) %>% gt()
tidy_summary(list("pctforeign_controls_placebo" = results_m2_pctforeign_w_controls_placebo)) %>% gt()

rm(placebo_df)
```

We find the opposite: the percentage of third-country nationals is, if anything, weakly *positively* associated with support for far-right parties in previous elections, which casts doubt on the hypothesis that far-right voters are residentially segregated from third-country nationals.

This suggests that the evidence from Vienna specifically for the proposed mechanism (namely, that direct competition with third-country nationals for public housing resources pushes voters to support far-right parties) is weak.

#### Reweighting

If we had individual rather than aggregate data, we would most likely fit a model at the voter level, regressing whether or not an individual cast a vote for a far-right party in the 2006 federal election against whether they lived in public housing, the number of third-country nationals living in their neighborhood, etc. This paper uses aggregate data at the tract and municipality level as an approximation to the individual data. 
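
The link between a voter-level model and an aggregate one can be sketched on simulated data. The block below is an illustration under stated assumptions, not the authors' data or code: `indiv.df`, `agg.df`, `voters`, and `x` are hypothetical names, the covariate varies only at the unit level, and plain `stats::lm` is used as a linear probability model. Under those assumptions, regressing unit-level vote shares on the covariate with weights equal to the number of voters gives the same coefficients as the voter-level regression, while the unweighted aggregate regression generally does not.

```r
# Illustrative only: simulated voters nested in administrative units.
set.seed(1)
n_units <- 100
voters  <- sample(20:2000, n_units, replace = TRUE)  # hypothetical unit sizes
x       <- runif(n_units)                            # unit-level covariate

# One row per simulated voter; vote is 1 for a far-right vote, 0 otherwise
indiv.df <- data.frame(
  unit = rep(seq_len(n_units), times = voters),
  x    = rep(x, times = voters)
)
indiv.df$vote <- rbinom(nrow(indiv.df), 1, plogis(-2 + 0.5 * indiv.df$x))

# Aggregate to unit-level vote shares (tapply orders units 1..n_units)
agg.df <- data.frame(
  x      = x,
  voters = voters,
  share  = as.numeric(tapply(indiv.df$vote, indiv.df$unit, mean))
)

fit_indiv      <- lm(vote ~ x, data = indiv.df)                  # voter level
fit_unweighted <- lm(share ~ x, data = agg.df)                   # units equal
fit_weighted   <- lm(share ~ x, data = agg.df, weights = voters) # voters equal

print(rbind(
  individual = coef(fit_indiv),
  unweighted = coef(fit_unweighted),
  weighted   = coef(fit_weighted)
))
```

The equivalence holds only for the point estimates; standard errors differ between the voter-level and aggregate fits.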
However, unless we weight the regression by the number of voters in a municipality, we are implicitly weighting voters differently in different municipalities or tracts. As a result, we rerun both models, reweighting the observations by the number of voters in the administrative unit.

```{r reweighting1}
results_m1_reweight <- estimatr::lm_robust(
  d_rr_06 ~ dv_pop_01 * pct_noneu_06,
  data = austria_authors.df,
  weights = registered_06
)
repro_m1_reweight.df <- tidy_results(list(comp = results_m1_reweight)) %>%
  bind_rows(tidy_m1_original.df)
repro_m1_reweight_summary.df <- tidy_summary(list(comp = results_m1_reweight))

repro_m1_reweight.df %>% group_by(repl_id) %>% gt()
repro_m1_reweight_summary.df %>% group_by(repl_id) %>% gt()
```

We see that both effects become more pronounced: first, municipalities with higher numbers of third-country residents are substantially less likely to vote for far-right parties, and second, municipalities with high proportions of people living in public housing are more likely to vote for far-right parties.

We can repeat the same analysis at the level of Vienna tracts for the second analysis.

```{r reweighting2}
results_m2_reweight <- estimatr::lm_robust(
  dv ~ pctrental + pctpublic_w_zsp,
  data = vienna_authors.df,
  weights = exp(log_voters),
  clusters = tract_key
)
repro_m2_reweight.df <- tidy_results(list(comp = results_m2_reweight)) %>%
  bind_rows(tidy_m2_original.df)
repro_m2_reweight_summary.df <- tidy_summary(list(comp = results_m2_reweight))

repro_m2_reweight.df %>% group_by(repl_id) %>% gt()
repro_m2_reweight_summary.df %>% group_by(repl_id) %>% gt()
```

In this case, the weighting makes almost no difference to the model results.

We can also perform our alternative specification, where we adjust for the percentage of third-country residents in the Vienna analysis, with reweighting.

```{r reweighting3}
results_m2_pctforeign_reweight <- estimatr::lm_robust(dv ~ pctforeign * pctpublic_w_zsp, data = vienna_authors.df, weights = exp(log_voters), clusters = 
tract_key)
tidy_results(list(results_m2_pctforeign_reweight)) %>% gt()
tidy_summary(list(results_m2_pctforeign_reweight)) %>% gt()
```

Again, this doesn't seem to meaningfully affect the results.

### Data Robustness Analysis

#### Graphical Exploration of Data (intraocular impact)

To better understand the authors' main claims, we attempt to visualize the trends they find in the raw data. We begin with the first analysis, which links support for far-right political parties in the 2006 federal election with the proportion of third-country nationals across Austrian municipalities.

```{r intraoccular-1}
requireNamespace("plotly")

## AUSTRIA
# Plot the relationship between % non-EU and change in vote share
suppressWarnings({
  austria_authors.df %>%
    ggplot(aes(x = pct_noneu_06, y = d_rr_06)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Change in far-right vote share") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    geom_smooth(method = lm)
} %>% plotly::ggplotly())

# Plot the relationship between % non-EU and vote share
suppressWarnings({
  austria_authors.df %>%
    ggplot(aes(x = pct_noneu_06, y = rr_share_06)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Far-right vote share") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    geom_smooth(method = lm)
} %>% plotly::ggplotly())

# Compare vote share in 2002 and 2006
suppressWarnings({
  austria_authors.df %>%
    select(pct_noneu_06, rr_share_06, rr_share_02) %>%
    pivot_longer(
      cols = starts_with("rr_share"),
      names_prefix = "rr_share_",
      names_to = "year",
      values_to = "vote_share"
    ) %>%
    ggplot(aes(x = pct_noneu_06, y = vote_share)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Far-right vote share") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    geom_smooth(method = lm) +
    facet_wrap(vars(year))
} %>% plotly::ggplotly())
```

Visual examination of the data indicates that (1) there is a *very* modest association between the 
percentage of third-country nationals and both the level of support and the *change* in the level of support for far-right parties in the 2006 elections; and (2) that this trend is more pronounced in 2006 than in 2002. The article's primary interest, however, is the interaction between the share of third-country nationals and the rate of public housing, i.e., the claim that support is driven by competition between Austrian and third-country nationals for housing, which is most acute when public housing rates and the proportion of the population that is third-country nationals are both high. To visualize this, we plot the relationship between the percentage of third-country nationals and support for far-right parties, stratifying by the percentage of the population that lives in public housing.

```{r intraoccular-2}
# Bin the percentage of the municipality in public housing and plot the
# change in vote share by % non-EU residents
cuts <- with(
  austria_authors.df,
  quantile(dv_pop_01, probs = seq(0, 1, 1/3), na.rm = TRUE)
)
suppressWarnings({
  austria_authors.df %>%
    mutate(pct_public_housing = cut(dv_pop_01, breaks = cuts, include.lowest = TRUE)) %>%
    # ~ 13 municipalities don't have data on public housing
    drop_na(pct_public_housing) %>%
    ggplot(aes(x = pct_noneu_06, y = d_rr_06)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Change in far-right vote share") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    geom_smooth(method = lm) +
    facet_wrap(vars(pct_public_housing))
} %>% plotly::ggplotly())
```

We see that while the trend line does appear to get steeper, it is estimated with a considerable amount of uncertainty; in particular, in all three cases, the trend is not visually apparent, and the slope of the fitted linear model has a high degree of uncertainty about sign.

We additionally color the observations based on the baseline level of support for far-right parties to see if an interaction with public housing rates and proportion of third-country nationals is apparent. 
In particular, it seems plausible that these relationships might be strengthened in places where the baseline support for far-right parties (as measured by their support in 2002) is already high.

```{r intraoccular-3}
# Add coloring based on how far-right the municipality was in the previous
# election
suppressWarnings({
  austria_authors.df %>%
    mutate(pct_public_housing = cut(dv_pop_01, breaks = cuts, include.lowest = TRUE)) %>%
    # ~ 13 municipalities don't have data on public housing
    drop_na(pct_public_housing) %>%
    ggplot(aes(x = pct_noneu_06, y = d_rr_06, color = rr_share_02)) +
    geom_point() +
    labs(x = "% non-EU residents in municipality",
         y = "Change in far-right vote share",
         color = "% far-right in\nprevious election") +
    # NOTE: There are ~200 municipalities with 0 non-EU residents
    scale_x_log10() +
    scale_color_gradient(low = "blue", high = "red") +
    geom_smooth(method = lm) +
    facet_wrap(vars(pct_public_housing))
} %>% plotly::ggplotly())
```

While some extreme outliers seem to be in places that were already far-right in 2002, no trend is immediately apparent.

Next, we consider the second analysis, which tries to more directly test whether direct competition for public housing between Austrian and third-country nationals explains increasing support for far-right parties in Vienna.

We begin by visualizing the relationship between the proportion of third-country nationals living in a tract, the proportion of residents living in public housing in a tract, and change in far-right vote share.

```{r inraoccular-4}
## VIENNA
# Plot the relationship between % non-EU and change in vote share
suppressWarnings({
  vienna_authors.df %>%
    ggplot(aes(x = pctforeign, y = dv)) +
    geom_point() +
    labs(x = "% non-EU residents in tract",
         y = "Change in far-right vote share") +
    # NOTE: There are ~15 tracts with no foreign residents
    scale_x_log10() +
    geom_smooth(method = lm)
} %>% plotly::ggplotly())

# Plot the relationship between % in public housing and change in vote share
suppressWarnings({
  vienna_authors.df %>%
    ggplot(aes(x = 
pctpublic_w_zsp, y = dv)) +
    geom_point() +
    labs(x = "% of residents in public housing",
         y = "Change in far-right vote share") +
    # NOTE: There are ~900 tracts with no one in public housing
    scale_x_log10() +
    geom_smooth(method = lm)
} %>% plotly::ggplotly())
```

We notice two important facts. First, the relationship between the percentage of individuals living in public housing and the change in far-right vote share is much more pronounced and positive in Vienna than in the national data. Second, the relationship between the percentage of third-country nationals and the far-right vote share is actually very pronounced and negative. To understand the interrelationship between these two factors, we stratify by the proportion of individuals living in public housing and plot the relationship between the percentage of third-country nationals and change in far-right vote share. We see that while the rate of support does seem to rise in the highest bin (i.e., those tracts where the largest share of people live in public housing), the trend within each bin remains fairly negative.

```{r intraoccular-4}
# Plot the relationship between % non-EU and change in vote share, stratified by
# the rate of public housing
cuts <- with(
  vienna_authors.df,
  quantile(pctpublic_w_zsp, probs = c(0, 1/2, 3/4, 1), na.rm = TRUE)
)
suppressWarnings({
  vienna_authors.df %>%
    mutate(public_housing = cut(pctpublic_w_zsp, cuts, include.lowest = TRUE)) %>%
    ggplot(aes(x = pctforeign, y = dv)) +
    geom_point() +
    labs(x = "% non-EU residents in tract",
         y = "Change in far-right vote share") +
    # NOTE: There are ~15 tracts with no foreign residents
    scale_x_log10() +
    geom_smooth(method = lm) +
    facet_wrap(vars(public_housing))
} %>% plotly::ggplotly())
rm(cuts)
```

#### Outlier analysis

We first examine the outliers in the main variables from Table 1 (Austrian sample). 
The following code visualizes the relationships between the main variables of interest: the percentage of non-EU residents, the percentage of people living in public housing, and the change in far-right vote share. For each of these variables, we flag the largest and smallest 1% of values as outliers. The plots then compare the fitted linear relationship between the variables with and without the flagged observations.

```{r outliers_plots}
library(patchwork)

###### OUTLIERS PLOTS
# p1: change in far-right vote share vs. % public housing
p1 <- austria_authors.df %>%
  mutate(outlier = ifelse(
    (
      dv_pop_01 > quantile(dv_pop_01, 0.99, na.rm = T) |
        d_rr_06 > quantile(d_rr_06, 0.99, na.rm = T) |
        dv_pop_01 < quantile(dv_pop_01, 0.01, na.rm = T) |
        d_rr_06 < quantile(d_rr_06, 0.01, na.rm = T)
    ),
    T, F
  )) %>%
  ggplot(aes(x = dv_pop_01, y = d_rr_06)) +
  geom_point(aes(color = outlier), size = 1) +
  scale_color_manual(values = c('navy', 'red')) +
  ggtitle('') +
  ylab('Δ 2002–6') +
  xlab('% public housing') +
  # red: fit on the whole sample; blue: fit without flagged outliers
  geom_smooth(method = "lm", se = T, color = "red") +
  geom_smooth(data = . %>% filter(outlier == F), method = "lm", se = T, color = "blue") +
  geom_text(
    data = . %>% summarise(r2 = summary(lm(d_rr_06 ~ dv_pop_01))$r.squared),
    aes(label = paste("R^2 =", round(r2, 3)), x = 0.4, y = 0.2),
    color = "red", hjust = 0
  ) +
  geom_text(
    data = . %>% filter(outlier == 0) %>%
      summarise(r2 = summary(lm(d_rr_06 ~ dv_pop_01))$r.squared),
    aes(label = paste("R^2 =", round(r2, 3)), x = 0.4, y = 0.18),
    color = "blue", hjust = 0
  )

# p2: change in far-right vote share vs. % non-EU residents
p2 <- austria_authors.df %>%
  mutate(outlier = ifelse(
    (
      pct_noneu_06 > quantile(pct_noneu_06, 0.99, na.rm = T) |
        d_rr_06 > quantile(d_rr_06, 0.99, na.rm = T) |
        pct_noneu_06 < quantile(pct_noneu_06, 0.01, na.rm = T) |
        d_rr_06 < quantile(d_rr_06, 0.01, na.rm = T)
    ),
    T, F
  )) %>%
  ggplot(aes(x = pct_noneu_06, y = d_rr_06)) +
  geom_point(aes(color = outlier), size = 1) +
  scale_color_manual(values = c('navy', 'red')) +
  ggtitle('') +
  ylab('Δ 2002–6') +
  xlab('% non-EU') +
  geom_smooth(method = "lm", se = T, color = "red") +
  geom_smooth(data = . %>% filter(outlier == F), method = "lm", se = T, color = "blue") +
  geom_text(data = . 
%>% summarise(r2 = summary(lm(d_rr_06 ~ pct_noneu_06))$r.squared),
    aes(label = paste("R^2 =", round(r2, 3)), x = 0.15, y = 0.2),
    color = "red", hjust = 0
  ) +
  geom_text(
    data = . %>% filter(outlier == 0) %>%
      summarise(r2 = summary(lm(d_rr_06 ~ pct_noneu_06))$r.squared),
    aes(label = paste("R^2 =", round(r2, 3)), x = 0.15, y = 0.18),
    color = "blue", hjust = 0
  )

# p3: outliers flagged on any of the three key variables
# NOTE: as written, the x-axis is pct_noneu_06 squared, not dv_pop_01 * pct_noneu_06
p3 <- austria_authors.df %>%
  mutate(outlier = ifelse(
    (
      dv_pop_01 > quantile(dv_pop_01, 0.99, na.rm = TRUE) |
        d_rr_06 > quantile(d_rr_06, 0.99, na.rm = TRUE) |
        dv_pop_01 < quantile(dv_pop_01, 0.01, na.rm = TRUE) |
        d_rr_06 < quantile(d_rr_06, 0.01, na.rm = TRUE) |
        pct_noneu_06 > quantile(pct_noneu_06, 0.99, na.rm = TRUE) |
        d_rr_06 > quantile(d_rr_06, 0.99, na.rm = TRUE) |
        pct_noneu_06 < quantile(pct_noneu_06, 0.01, na.rm = TRUE) |
        d_rr_06 < quantile(d_rr_06, 0.01, na.rm = TRUE)
    ),
    TRUE, FALSE
  )) %>%
  ggplot(aes(x = I(pct_noneu_06 * pct_noneu_06), y = d_rr_06)) +
  geom_point(aes(color = outlier), size = 1) +
  scale_color_manual(values = c('navy', 'red')) +
  ggtitle('') + # Add your plot title
  geom_smooth(method = "lm", se = T, color = "red") + # Add smoothed line for the whole sample
  geom_smooth(data = . %>% filter(outlier == F), method = "lm", se = T, color = "blue") +
  geom_text(
    data = . %>%
      summarise(r2 = summary(lm(d_rr_06 ~ I(pct_noneu_06 * pct_noneu_06)))$r.squared),
    aes(label = paste("R^2 =", round(r2, 3)), x = 0.04, y = 0.2),
    color = "red", hjust = 0
  ) +
  geom_text(
    data = . %>% filter(outlier == 0) %>%
      summarise(r2 = summary(lm(d_rr_06 ~ I(pct_noneu_06 * pct_noneu_06)))$r.squared),
    aes(label = paste("R^2 =", round(r2, 3)), x = 0.04, y = 0.18),
    color = "blue", hjust = 0
  )

suppressWarnings(print({p1 + p2 + p3}))
rm(p1, p2, p3)
```

From these plots, we can observe that after removing outliers, the R-squared values decrease substantially. 
Respectively, R-squared drops from 0.028 to 0.019 (a 32% decrease), from 0.013 to 0.010 (a 23% decrease), and from 0.008 to 0.003 for the key interaction (a 62.5% decrease).

Now, we replicate the baseline model from Table 1 (a regression of far-right vote change on public housing, non-EU residents, and their interaction).

```{r outliers_reg1}
library("lfe")
library("gtsummary")

# Flag observations in the top and/or bottom 1% of any of the three key variables
austria_authors.df_upd <- austria_authors.df %>%
  mutate(
    # Top or bottom 1%
    outlier98 = ifelse(
      (dv_pop_01 > quantile(dv_pop_01, 0.99, na.rm = TRUE) |
        d_rr_06 > quantile(d_rr_06, 0.99, na.rm = TRUE) |
        dv_pop_01 < quantile(dv_pop_01, 0.01, na.rm = TRUE) |
        d_rr_06 < quantile(d_rr_06, 0.01, na.rm = TRUE) |
        pct_noneu_06 > quantile(pct_noneu_06, 0.99, na.rm = TRUE) |
        pct_noneu_06 < quantile(pct_noneu_06, 0.01, na.rm = TRUE)),
      TRUE, FALSE
    ),
    # Top 1% only
    outlier98_up = ifelse(
      (dv_pop_01 > quantile(dv_pop_01, 0.99, na.rm = TRUE) |
        pct_noneu_06 > quantile(pct_noneu_06, 0.99, na.rm = TRUE) |
        d_rr_06 > quantile(d_rr_06, 0.99, na.rm = TRUE)),
      TRUE, FALSE
    ),
    # Bottom 1% only
    outlier98_down = ifelse(
      (dv_pop_01 < quantile(dv_pop_01, 0.01, na.rm = TRUE) |
        pct_noneu_06 < quantile(pct_noneu_06, 0.01, na.rm = TRUE) |
        d_rr_06 < quantile(d_rr_06, 0.01, na.rm = TRUE)),
      TRUE, FALSE
    )
  )

# Original specification on the full sample (robust = "HC1" passed to tidy_robust)
ols1 <- felm(d_rr_06 ~ dv_pop_01 * pct_noneu_06 | 0 | 0, data = austria_authors.df) %>%
  tbl_regression(tidy_fun = purrr::partial(tidy_robust, robust = "HC1")) %>%
  add_significance_stars(hide_p = FALSE, hide_se = FALSE) %>%
  gtsummary::add_glance_table(include = c(nobs, r.squared))

# Same specification with the top and bottom 1% dropped
ols2 <- felm(d_rr_06 ~ dv_pop_01 * pct_noneu_06 | 0 | 0,
  data = subset(austria_authors.df_upd, outlier98 == FALSE)) %>%
  tbl_regression() %>%
  add_significance_stars(hide_p = FALSE, hide_se = FALSE) %>%
  add_glance_table(include = c(nobs, r.squared))

# Same specification with only the top 1% dropped
ols3 <- felm(d_rr_06 ~ dv_pop_01 * pct_noneu_06 | 0 | 0,
  data = subset(austria_authors.df_upd, outlier98_up == FALSE)) %>%
  tbl_regression() %>%
  add_significance_stars(hide_p = FALSE, hide_se = FALSE) %>%
  add_glance_table(include = c(nobs, r.squared))

# Same specification with only the bottom 1% dropped
ols4 <- felm(d_rr_06 ~
  dv_pop_01 * pct_noneu_06 | 0 | 0,
  data = subset(austria_authors.df_upd, outlier98_down == FALSE)) %>%
  tbl_regression() %>%
  add_significance_stars(hide_p = FALSE, hide_se = FALSE) %>%
  add_glance_table(include = c(nobs, r.squared))

tbl_merge_ex1 <- tbl_merge(
  tbls = list(ols1, ols2, ols3, ols4),
  tab_spanner = c("**Original Specification**", "1% Min-Max Dropped", "1% Max Dropped", "1% Min Dropped")
)
tbl_merge_ex1
```

We can see that the initial baseline model is reproducible in terms of effect sizes and standard errors. But once we change subsamples, the results change as well. Excluding the highest 1% of values for the three key variables changes the estimate for the interaction term: the coefficient drops to 0.47 with a p-value of 0.091, compared to a p-value of \<0.001 in the baseline model.

Next, we cluster standard errors by district (bezirk). For the initial sample, the result does not change drastically (the p-value for the interaction term increases to 0.006). Yet when the highest values are dropped, the interaction term moves even further from significance (p = 0.2).

```{r outliers_reg2}
# Same four samples as above, with standard errors clustered by bezirk
rols1 <- felm(d_rr_06 ~ dv_pop_01 * pct_noneu_06 | 0 | 0 | bezirk, data = austria_authors.df) %>%
  tbl_regression(tidy_fun = purrr::partial(tidy_robust, robust = "HC1")) %>%
  add_significance_stars(hide_p = FALSE, hide_se = FALSE) %>%
  add_glance_table(include = c(nobs, r.squared))

rols2 <- felm(d_rr_06 ~ dv_pop_01 * pct_noneu_06 | 0 | 0 | bezirk,
  data = subset(austria_authors.df_upd, outlier98 == FALSE)) %>%
  tbl_regression() %>%
  add_significance_stars(hide_p = FALSE, hide_se = FALSE) %>%
  add_glance_table(include = c(nobs, r.squared))

rols3 <- felm(d_rr_06 ~ dv_pop_01 * pct_noneu_06 | 0 | 0 | bezirk,
  data = subset(austria_authors.df_upd, outlier98_up == FALSE)) %>%
  tbl_regression() %>%
  add_significance_stars(hide_p = FALSE, hide_se = FALSE) %>%
  add_glance_table(include = c(nobs, r.squared))

rols4 <- felm(d_rr_06 ~ dv_pop_01 * pct_noneu_06 | 0 | 0 | bezirk,
  data = subset(austria_authors.df_upd, outlier98_down == FALSE)) %>%
  tbl_regression() %>%
  add_significance_stars(hide_p = FALSE, hide_se =
    FALSE) %>%
  add_glance_table(include = c(nobs, r.squared))

tbl_merge_ex1 <- tbl_merge(
  tbls = list(rols1, rols2, rols3, rols4),
  tab_spanner = c("**Original Specification**", "1% Min-Max Dropped", "1% Max Dropped", "1% Min Dropped")
)
tbl_merge_ex1
```

Here we look at the covariate balance between outlier and non-outlier observations.

```{r outliers_cov}
library("cobalt")
library("MatchIt")

# Keep rows with a non-missing upper-outlier flag and complete covariates
austria_authors.df_upd1 <- subset(austria_authors.df_upd, outlier98_up == TRUE | outlier98_up == FALSE)
austria_authors.df_upd1 <- austria_authors.df_upd1 %>%
  filter(!is.na(educ_tertiary) &
    !is.na(avg_income) &
    !is.na(lab_pct_manufact_01) &
    !is.na(lab_pct_unemp) &
    !is.na(welfare_cap_06) &
    !is.na(health_cap_06) &
    !is.na(education_cap_06) &
    !is.na(foreignborn_delta))

set.seed(7)
# Match upper outliers to non-outliers on the covariates, then report balance
m.out <- MatchIt::matchit(
  outlier98_up ~ educ_tertiary + registered_06 + avg_income + lab_pct_manufact_01 +
    lab_pct_unemp + welfare_cap_06 + health_cap_06 + education_cap_06 + foreignborn_delta,
  data = austria_authors.df_upd1
)
bal.tab(m.out, thresholds = c(m = .1), un = TRUE)
love.plot(m.out,
  stats = c("mean.diffs"),
  thresholds = c(m = .1, v = 2),
  abs = TRUE,
  binary = "std",
  var.order = "unadjusted"
)
```

And we replicate the authors' main specification on the sample of outliers:

```{r}
out1 <- felm(d_rr_06 ~ dv_pop_01 * pct_noneu_06 | 0 | 0 | bezirk,
  data = subset(austria_authors.df_upd, outlier98_up == TRUE)) %>%
  tbl_regression() %>%
  add_significance_stars(hide_p = FALSE, hide_se = FALSE) %>%
  add_glance_table(include = c(nobs, r.squared))

tbl_merge_ex2 <- tbl_merge(tbls = list(out1), tab_spanner = c(""))
tbl_merge_ex2
```

### Model fit

We compare the leave-one-out (LOO) prediction error of the interaction model with that of a model containing only main effects, first for the Austrian municipal data and then for the Vienna data.

```{r}
# LOO squared prediction errors: refit without row i, then predict row i
sqe1 <- double(nrow(austria_authors.df))
for (i in seq_len(nrow(austria_authors.df))) {
  row <- austria_authors.df[i, ]
  m <- lm(d_rr_06 ~ dv_pop_01 * pct_noneu_06, data = austria_authors.df[-i, ])
  sqe1[[i]] <- (row$d_rr_06 - predict(m, row))^2
}
print(glue::glue("LOO RMSE for interaction model: { sqrt(mean(sqe1, na.rm = TRUE)) }"))

sqe2 <- double(nrow(austria_authors.df))
for (i in seq_len(nrow(austria_authors.df))) {
  row <- austria_authors.df[i, ]
  m <- lm(d_rr_06 ~
    dv_pop_01 + pct_noneu_06, data = austria_authors.df[-i, ])
  sqe2[[i]] <- (row$d_rr_06 - predict(m, row))^2
}
print(glue::glue("LOO RMSE for main effects: { sqrt(mean(sqe2, na.rm = TRUE)) }"))
rm(sqe1)
rm(sqe2)
```

```{r}
# Same LOO comparison for the Vienna ward data
sqe1 <- double(nrow(vienna_authors.df))
for (i in seq_len(nrow(vienna_authors.df))) {
  row <- vienna_authors.df[i, ]
  m <- lm(dv ~ pctrental * pctpublic_w_zsp, data = vienna_authors.df[-i, ])
  sqe1[[i]] <- (row$dv - predict(m, row))^2
}
print(glue::glue("LOO RMSE for interaction model: { sqrt(mean(sqe1, na.rm = TRUE)) }"))

sqe2 <- double(nrow(vienna_authors.df))
for (i in seq_len(nrow(vienna_authors.df))) {
  row <- vienna_authors.df[i, ]
  m <- lm(dv ~ pctrental + pctpublic_w_zsp, data = vienna_authors.df[-i, ])
  sqe2[[i]] <- (row$dv - predict(m, row))^2
}
print(glue::glue("LOO RMSE for main effects: { sqrt(mean(sqe2, na.rm = TRUE)) }"))
rm(sqe1)
rm(sqe2)
```

## Conclusion

We find that the research is computationally replicable, but the evidence for the main causal claims is overstated. While the analysis concentrates on patterns in outcomes that are likely relevant to understanding the underlying data-generating process, the statistical models supporting the causal claims explain only a small proportion of the overall variance in outcomes. Furthermore, the analysis elides relevant competing models. For example, models with only main effects fit the data nearly as well as those including the interaction term that supports the main causal claim.

The relative weakness of the claim is obscured in the published analysis because neither overall goodness of fit nor comparisons to 'naive' / baseline models are included. While conceptual reproducibility analysis would be useful for exploring more complex alternative models, this route is obstructed by the absence of citations, documentation, and linking codes that would support reliable reanalysis using the original data, or augmentation of the authors' data with additional measures.
We conjecture that, as a general practice, research reliability would be increased by including these practices in publication and data sharing.
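
As an illustration of how cheaply a comparison against baseline models can be reported, the following is a minimal sketch on simulated data. It is not the authors' data or our analysis: the data frame `sim`, its columns `y`, `x1`, `x2`, and the helper functions `loo_rmse` and `loo_rmse_refit` are all hypothetical. The sketch relies on the standard identity for ordinary least squares that the leave-one-out residual equals the ordinary residual divided by one minus the observation's leverage, so the model does not need to be refit once per observation.

```r
# Simulated data; the names sim, y, x1, x2 are hypothetical stand-ins
set.seed(42)
n <- 150
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- 0.10 * sim$x1 + 0.05 * sim$x2 + rnorm(n, sd = 0.05)

# LOO RMSE via the OLS identity e_(i) = e_i / (1 - h_ii)
loo_rmse <- function(formula, data) {
  fit <- lm(formula, data = data)
  sqrt(mean((residuals(fit) / (1 - hatvalues(fit)))^2))
}

# Brute-force version (refits the model n times), used only as a check
loo_rmse_refit <- function(formula, data) {
  outcome <- all.vars(formula)[1]
  sq_err <- vapply(seq_len(nrow(data)), function(i) {
    fit <- lm(formula, data = data[-i, ])
    (data[[outcome]][i] - predict(fit, newdata = data[i, , drop = FALSE]))^2
  }, numeric(1))
  sqrt(mean(sq_err))
}

# Report the interaction model next to simpler baselines
rmse_null <- loo_rmse(y ~ 1, sim)
rmse_main <- loo_rmse(y ~ x1 + x2, sim)
rmse_interaction <- loo_rmse(y ~ x1 * x2, sim)
print(c(null = rmse_null, main = rmse_main, interaction = rmse_interaction))
```

The shortcut applies only to models fit by ordinary least squares with complete cases. For other estimators, the explicit refitting loop remains the safer choice.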